A Hybrid Approach to Extracting and Classifying Verb+Noun Constructions

نویسندگان

  • Amalia Todirascu-Courtier
  • Dan Tufis
  • Ulrich Heid
  • Christopher Gledhill
  • Dan Stefanescu
  • Marion Weller
  • François Rousselot
چکیده

We present the main findings and preliminary results of an ongoing project aimed at developing a system for collocation extraction based on contextual morpho-syntactic properties. We explored two hybrid extraction methods: the first method applies languageindepedent statistical techniques followed by a linguistic filtering, while the second approach, available only for German, is based on a set of lexico-syntactic patterns to extract collocation candidates. To define extraction and filtering patterns, we studied a specific collocation category, the Verb-Noun constructions, using a model inspired by the systemic functional grammar, proposing three level analysis: lexical, functional and semantic criteria. From tagged and lemmatized corpus, we identify some contextual morphosyntactic properties helping to filter the output of the statistical methods and to extract some potential interesting VN constructions (complex predicates vs complex predicator). The extracted candidates are validated and classified manually.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Mix Approach to Extracting and Classifying Verb+Noun Constructions

Amalia Todiraşcu, Dan Tufiş, Ulrich Heid, Christopher Gledhill, Dan Ştefanescu, Marion Weller, François Rousselot 1 LILPA, Université Marc Bloch, Strasbourg, France, {todiras, gledhill}@umb.u-strasbg.fr RACAI, Romanian Academy, Bucharest, Romania, {tufis, danstef}@racai.ro IMS Stuttgart, Universität Stuttgart, Germany, {uli, wellerm}@ims.uni-stuttgart.de INSA Strasbourg, France, Francois.Rousse...

متن کامل

Dependency Parsing for Identifying Hungarian Light Verb Constructions

Light verb constructions (LVCs) are verb and noun combinations in which the verb has lost its meaning to some degree and the noun is used in one of its original senses. They often share their syntactic pattern with other constructions (e.g. verbobject pairs) thus LVC detection can be viewed as classifying certain syntactic patterns as light verb constructions or not. In this paper, we explore a...

متن کامل

ConFarm: Extracting Surface Representations of Verb and Noun Constructions from Dependency Annotated Corpora of Russian

ConFarm is a web service dedicated to extraction of surface representations of verb and noun constructions from dependency annotated corpora of Russian texts. Currently, the extraction of constructions with a specific lemma from SynTagRus and Russian National Corpus is available. The system provides flexible interface that allows users to finetune the output. Extracted constructions are grouped...

متن کامل

Unsupervised Classification of Verb Noun Multi-Word Expression Tokens

We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. Our approach hinges upon the assumption that a literal VNC will have more in common with its component words than an idiomatic one. Commonality is mea...

متن کامل

Handling Sparsity for Verb Noun MWE Token Classification

We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. Our approach hinges upon the assumption that a literal VNC will have more in common with its component words than an idiomatic one. Commonality is mea...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008